Web and Corpus Methods for Malay Count Classifier Prediction

Authors

Jeremy Nicholson

Timothy Baldwin

Abstract

We examine the capacity of Web and corpus frequency methods to predict preferred count classifiers for nouns in Malay. The Web model's observed F-score of 0.671 considerably exceeds those of the corpus-based frequency and machine learning models. We expect that this is a fruitful extension of Web-as-corpus approaches to lexicons in languages other than English, but further research is required in other South-East and East Asian languages.

Similar resources

Learning Count Classifier Preferences of Malay Nouns

We develop a data set of Malay lexemes labelled with count classifiers that are attested in raw or lemmatised corpora. A maximum entropy classifier based on simple, language-inspecific features generated from context tokens achieves about 50% F-score, or about 65% precision when a suite of binary classifiers is built to aid multi-class prediction of headword nouns. Surprisingly, numeric feature...

Corpus Design for Malay Corpus-based Speech Synthesis System

Problem statement: The speech corpus is one of the major components in corpus-based synthesis. The quality and coverage of the speech corpus affect the quality of the synthesized speech. Approach: This study proposes a corpus design for a Malay corpus-based speech synthesis system. This includes the study of design criteria in corpus-based speech synthesis, Malay corpus based database design and the...

A cross-cultural study of request speech act: Iraqi and Malay students

Several studies have indicated that the range and linguistic expressions of external modifiers available in one language differ from those available in another language. The present study aims to investigate the cross-cultural differences and similarities with regard to the realization of request external modifications. To this end, 30 Iraqi and 30 Malay u...

Economic Prediction using Heterogeneous Data Streams from the World Wide Web

Learning to predict financial and economic variables of interest is a hard problem with a large body of literature devoted to it. Of late there has been a significant amount of work on using sources of text from the Web (such as Twitter or Google Trends) to predict financial and economic variables. Much of this work has relied on some form or other of superficial sentiment analysis to represent...

A Novel Approach to Feature Selection Using PageRank algorithm for Web Page Classification

In this paper, a novel filter-based approach is proposed using the PageRank algorithm to select the optimal subset of features as well as to compute their weights for web page classification. To evaluate the proposed approach multiple experiments are performed using accuracy score as the main criterion on four different datasets, namely WebKB, Reuters-R8, Reuters-R52, and 20NewsGroups. By analy...

Publication date: 2009
